ABSTRACT

The bootstrap estimator of the asymptotic covariance matrix of a function of sample
means or sample quantiles is inconsistent in some situations. A modified bootstrap estimator is
proposed and shown to be consistent under weak conditions. A simulation study shows that, in
terms of finite-sample performance, the improvement from this modification is substantial. The
computation of the modified bootstrap estimator is much easier and cheaper than that of the
estimator based on the quantiles of the bootstrap distribution. We show by simulation that,
with the same number of bootstrap replicates (in the bootstrap Monte Carlo approximation), the
modified bootstrap estimator is more accurate than the estimator based on the interquartile
range of the bootstrap distribution.

1.  INTRODUCTION


Let $\mu$ be an unknown characteristic of a population distribution $F$. We focus on the
following two cases, which are frequently encountered in practice: (i) $F$ is $k$-variate and
$\mu = \int x\,dF$, the mean of $F$; (ii) $F$ is univariate and $\mu = (Q(p_1), \ldots, Q(p_k))'$,
where $Q(p_j)$ is the $p_j$-quantile of $F$. The quantity of interest is $\theta = g(\mu)$, where
$g$ is a fixed function from $R^k$ to $R^m$.

Let $X_1, \ldots, X_n$ be independent and identically distributed (i.i.d.) samples from $F$. A
point estimator of $\theta$ in case (i) is $\hat\theta = g(\bar X)$, where
$\bar X = n^{-1}\sum_{i=1}^n X_i$ is the sample mean. For case (ii), let $\hat Q(p_j)$ be the
sample $p_j$-quantile based on $X_1, \ldots, X_n$ and
$\hat Q = (\hat Q(p_1), \ldots, \hat Q(p_k))'$. A point estimator of $\theta$ is then
$\hat\theta = g(\hat Q)$.

It is well known that, under reasonable conditions, $n^{1/2}(\hat\theta - \theta)$ converges in
law (as the sample size $n \to \infty$) to an $m$-variate normal distribution with mean zero and
covariance matrix $\Sigma$. The matrix $\Sigma$ is called the asymptotic covariance matrix of
$\hat\theta$ and is usually unknown. To assess the accuracy of the point estimator $\hat\theta$
we need an estimator of $\Sigma$. Obtaining a good estimator of $\Sigma$ is also crucial for
other statistical inferences, such as testing hypotheses and setting confidence regions for
$\theta$.

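Efron (1979) introduced a bootstrap method for variance estimation. Let $X_1^*, \ldots, X_n^*$
be i.i.d. samples from $\{X_1, \ldots, X_n\}$, $\bar X^* = n^{-1}\sum_{i=1}^n X_i^*$, and
$\hat Q^*$ be the $k$-vector of sample quantiles based on $X_1^*, \ldots, X_n^*$. Let
$\hat\theta^* = g(\bar X^*)$ if $\hat\theta = g(\bar X)$ and $\hat\theta^* = g(\hat Q^*)$ if
$\hat\theta = g(\hat Q)$. The bootstrap estimator of the asymptotic covariance matrix $\Sigma$
of $\hat\theta$ is then

$$\hat\Sigma_b = n\,\mathrm{Var}_*(\hat\theta^*) = n E_*(\hat\theta^* - E_*\hat\theta^*)(\hat\theta^* - E_*\hat\theta^*)', \tag{1.1}$$

where $E_*$ and $\mathrm{Var}_*$ are the expectation and variance taken under the bootstrap
distribution.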
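For a scalar statistic, the bootstrap estimator (1.1) is usually approximated by resampling, as described later in Section 2.4. The following is a minimal sketch in Python using only the standard library; the function name `bootstrap_variance`, its arguments and the example data are illustrative assumptions, not part of the original study.

```python
import random
import statistics


def bootstrap_variance(data, stat, B=1000, rng=None):
    """Monte Carlo approximation of n * Var_*(stat(X*)) for a scalar statistic."""
    rng = rng or random.Random()
    n = len(data)
    # Each replicate resamples n points with replacement from the data.
    replicates = [stat(rng.choices(data, k=n)) for _ in range(B)]
    # The variance over the B replicates approximates Var_* in (1.1).
    return n * statistics.pvariance(replicates)


# Illustrative use: the sample median of 36 normal observations.
rng = random.Random(0)
sample = [rng.gauss(1.5, 2.0) for _ in range(36)]
print(bootstrap_variance(sample, statistics.median, B=500, rng=rng))
```

When `stat` is the sample mean, the exact value of $n\,\mathrm{Var}_*(\bar X^*)$ is the sample variance with divisor $n$, which gives a simple check of the approximation.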

An essential theoretical justification of a variance estimator is its consistency. When $g$ is
the identity function, the bootstrap estimator $\hat\Sigma_b$ is consistent. For the case of
$\hat\theta = \bar X$,

$$\hat\Sigma_b = n\,\mathrm{Var}_*(\bar X^*) = n^{-1}\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)' \to \mathrm{Var}(X_1) \quad \text{a.s.}$$

according to the strong law of large numbers. For the case of $\hat\theta = \hat Q$, the
consistency of $\hat\Sigma_b$ was proved by Babu (1986) under some conditions (see Theorem 2).

However, even for a smooth (differentiable) function $g$, the consistency of the bootstrap
estimator $\hat\Sigma_b$ is not guaranteed. A counter-example is given in Section 2.1. To
circumvent the inconsistency of the bootstrap variance estimator, we propose a modified
bootstrap variance estimator, which is described in Section 2.2. The consistency of the
modified bootstrap variance estimator for functions of sample means and sample quantiles is
established in Section 2.3. Variance estimators based on the quantiles of the bootstrap
distribution, such as a multiple of the interquartile range of the bootstrap distribution, are
also consistent, but the computation of our modified bootstrap estimator is much easier and
cheaper than that of the bootstrap quantiles. In Section 3, simulation results show that, when
estimating variances of functions of the sample median, the modified bootstrap estimator
significantly outperforms both the original bootstrap estimator and the estimator based on the
interquartile range of the bootstrap distribution in terms of finite-sample sampling properties.


2.  THE  MODIFIED  BOOTSTRAP  ESTIMATOR 
2.1.  A  Counter-example 

The  following  example  shows  that  the  bootstrap  estimator  (1.1)  may  be  inconsistent. 

We consider the univariate case. Let $F$ be a univariate distribution function satisfying
$F(x) = 1 - x^{-h}$ if $x > 10$ and $F(x) = |x|^{-h}$ if $x < -10$, where $h$ is a constant.
Thus, $F$ has a finite $s$th moment for any $s < h$. In particular, $F$ has a finite second
moment if $h > 2$. Let $t > h$ be a constant and $g(x) = \exp(x^t)$. Following the proof in
Ghosh et al. (1984, Example), the bootstrap variance estimator for the case where $\hat\theta$
is either $g(\bar X)$ or $g(\hat Q)$ (with $0 < p < 1$) is inconsistent if

(2.1)

where $X_{(n)} = \max(X_1, \ldots, X_n)$. In fact, under (2.1),
$n\,\mathrm{Var}_*(\hat\theta^*) \to \infty$ a.s.

To show (2.1), note that for any $M > 0$,

Fr  n-"^*[g(X(„))]2  <  Af  ;  S Ff  <  nog(M  ;
for large $n$. Thus, (2.1) follows from the Borel-Cantelli lemma.


2.2.  A  Modification 

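The above example shows that the bootstrap variance estimator may diverge to infinity while the
asymptotic variance of $\hat\theta$ is finite. The inconsistency of the bootstrap estimator is
caused by the fact that $\|\hat\theta^* - \hat\theta\|$ may take some exceptionally large
values, where $\|x\| = (x'x)^{1/2}$ for any vector $x$. A remedy is to truncate
$\hat\theta^* - \hat\theta$ at some value. Throughout the paper, the $j$th components of
$\hat\theta^*$ and $\hat\theta$ are denoted by $\hat\theta_j^*$ and $\hat\theta_j$,
respectively. Let $\tau(X) = \tau(X_1, \ldots, X_n)$ be an $m$-vector of functions of the data
satisfying

$$\tau_j \ge c_0 \quad \text{and} \quad \tau_j = O(1) \ \text{a.s.}, \quad j = 1, \ldots, m, \tag{2.2}$$

where $\tau_j$ is the $j$th component of $\tau(X)$ and $c_0$ is a fixed positive constant. A
modified bootstrap estimator of $\Sigma$ is

$$\hat\Sigma_m = n\,\mathrm{Var}_*(\Delta^*), \tag{2.3}$$

where $\Delta^* = (\Delta_1^*, \ldots, \Delta_m^*)'$ and

$$\Delta_j^* = \begin{cases} \tau_j & \text{if } \hat\theta_j^* - \hat\theta_j > \tau_j, \\ \hat\theta_j^* - \hat\theta_j & \text{if } |\hat\theta_j^* - \hat\theta_j| \le \tau_j, \\ -\tau_j & \text{if } \hat\theta_j^* - \hat\theta_j < -\tau_j. \end{cases} \tag{2.4}$$

In the following we establish the consistency of the modified bootstrap estimator
$\hat\Sigma_m$ under some weak conditions. Choices of the function $\tau(X)$ are discussed in
Section 2.4.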
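The truncation in (2.4) amounts to clipping each component of the bootstrap deviation to a symmetric interval. A minimal sketch in Python follows; the function name `truncate` and the example numbers are illustrative assumptions.

```python
def truncate(theta_star, theta_hat, tau):
    """Componentwise truncation of theta_star - theta_hat at +/- tau, as in (2.4)."""
    delta = []
    for ts, th, t in zip(theta_star, theta_hat, tau):
        d = ts - th
        # Clip the deviation d to the interval [-t, t].
        delta.append(max(-t, min(t, d)))
    return delta


# First component is cut at +1, second is kept, third is cut at -2.
print(truncate([2.5, 0.5, -4.0], [1.0, 1.0, 1.0], [1.0, 1.0, 2.0]))
```

Because clipping never increases the distance between two values, the variance of the truncated deviations cannot exceed that of the untruncated ones.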


2.3.  Consistency  of  the  Modified  Bootstrap  Estimator 

Let $F$ be a $k$-variate distribution function, $\mu = EX_1$, $\theta = g(\mu)$,
$\hat\theta = g(\bar X)$, and $\nabla g$ be the gradient of $g$. If $E\|X_1\|^2 < \infty$ and
$\nabla g$ is continuous in a neighborhood of $\mu$, then as $n \to \infty$,

$$n^{1/2}(\hat\theta - \theta) \to Z \quad \text{in law}, \tag{2.5}$$

where $Z$ has an $m$-variate normal distribution with mean zero and covariance matrix

$$\Sigma = \nabla g(\mu)\,\mathrm{Var}(X_1)\,(\nabla g(\mu))'.$$

The  proof  of  the  following  theorem  is  given  in  the  Appendix. 

Theorem 1. Assume that $E\|X_1\|^2 < \infty$ and $g$ is continuously differentiable in a
neighborhood of $\mu$. Then the modified bootstrap estimator $\hat\Sigma_m$ (defined in
(2.2)-(2.4)) is consistent, i.e., as $n \to \infty$,

$$\hat\Sigma_m \to \Sigma \quad \text{a.s.}$$

For the sample quantiles, we consider univariate $F$. Let the $j$th component of $\mu$ be
$Q(p_j)$ (the $p_j$-quantile of $F$), $0 < p_j < 1$, $\theta = g(\mu)$,
$\hat\theta = g(\hat Q)$, and let the modified bootstrap estimator $\hat\Sigma_m$ be defined
as in (2.2)-(2.4). It is well known that $n^{1/2}(\hat\theta - \theta)$ converges in law to an
$m$-variate normal distribution with mean zero and covariance matrix

$$\Sigma = \nabla g(\mu)\,\Lambda\,(\nabla g(\mu))', \tag{2.6}$$

where $\Lambda$ is a symmetric matrix whose $(i,j)$th element, for $p_i \le p_j$, is

$$\lambda_{ij} = p_i(1 - p_j)/[f(Q(p_i))\,f(Q(p_j))],$$

and $f(Q(p_i))$ is the derivative of $F$ at $Q(p_i)$, assumed to be positive.
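For a single quantile and a scalar function ($k = m = 1$), (2.6) reduces to $g'(Q(p))^2\,p(1-p)/f(Q(p))^2$. The sketch below evaluates this for the sample median of the normal distribution with median 1.5 and standard deviation 2 used in Section 3, with $g(x) = x$ and $n = 36$. The function name is an illustrative assumption; dividing by $n$ reproduces the value 0.1745 listed for this case in Table 1.

```python
import math
from statistics import NormalDist


def quantile_fn_asymptotic_variance(p, density_at_q, g_prime_at_q):
    """(2.6) for k = m = 1: g'(Q(p))^2 * p(1 - p) / f(Q(p))^2."""
    return g_prime_at_q ** 2 * p * (1 - p) / density_at_q ** 2


F = NormalDist(mu=1.5, sigma=2.0)
median = F.inv_cdf(0.5)
# g(x) = x, so g'(x) = 1 at the median.
sigma2 = quantile_fn_asymptotic_variance(0.5, F.pdf(median), 1.0)
print(round(sigma2, 4), round(sigma2 / 36, 4))
```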

We  have  the  following  result  (the  proof  is  in  the  Appendix). 

Theorem 2. Assume that $F$ is differentiable at $Q(p_j)$ with $f(Q(p_j)) > 0$ and
$0 < p_j < 1$, $j = 1, \ldots, k$, where $f$ is the derivative of $F$. Assume also that
$E[\log(1 + |X_1|)] < \infty$ and $g$ is continuously differentiable in a neighborhood of
$\mu = (Q(p_1), \ldots, Q(p_k))'$. Then the modified bootstrap estimator satisfies

$$\hat\Sigma_m \to \Sigma \quad \text{a.s.}$$

2.4.  Some  Practical  Issues


The modified bootstrap estimator $\hat\Sigma_m$ is consistent (under the weak conditions in
Theorems 1 and 2) for any function $\tau(X)$ satisfying (2.2). Two choices of the function
$\tau(X)$ for practical use are suggested as follows.

(1) $\tau_j \equiv$ a constant. This can be used when one has some rough information about the
asymptotic variance of $\hat\theta_j$. For example, the asymptotic variance is unknown but
bounded by a positive constant $C$. Then $\tau_j$ can be chosen to be any constant $\tau >$

(2) $\tau_j = \max(\rho|\hat\theta_j|, c_0)$ for a small positive constant $c_0$ and a positive
constant $\rho$. Clearly this $\tau_j$ satisfies (2.2) if $\hat\theta$ is strongly consistent.
The small constant $c_0$ is used to prevent $\tau_j$ from approaching zero. With this choice of
$\tau_j$, $|\hat\theta_j^* - \hat\theta_j|$ is replaced by $\tau_j$ when the ratio
$\hat\theta_j^*/\hat\theta_j$ differs from one by more than $100\rho\%$. A simulation study of
the performance of the modified bootstrap estimator $\hat\Sigma_m$ with this choice of
$\tau_j$ is given in Section 3.
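Choice (2) is straightforward to compute from the point estimate. A minimal sketch in Python follows; the function name `tau_relative` and the numerical values of `rho` and `c0` in the example are illustrative assumptions.

```python
def tau_relative(theta_hat, rho, c0):
    """Choice (2): tau_j = max(rho * |theta_hat_j|, c0) for each component."""
    return [max(rho * abs(t), c0) for t in theta_hat]


# The second component is close to zero, so the floor c0 takes over there.
print(tau_relative([2.0, 0.0, -4.0], rho=0.5, c0=0.05))
```

The floor `c0` only matters for components whose estimates are near zero; elsewhere the threshold scales with the size of the estimate.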

For numerical evaluation of the bootstrap estimator, Efron (1979) proposed the use of the
Monte Carlo approximation. The same idea can be used here for the evaluation of the modified
bootstrap estimator. That is, for $b = 1, \ldots, B$ we generate i.i.d. samples
$X_1^{*b}, \ldots, X_n^{*b}$ from $\{X_1, \ldots, X_n\}$ and calculate $\Delta^{*b}$ (based on
$X_1^{*b}, \ldots, X_n^{*b}$) according to (2.4). Then we use the sample covariance matrix of
$\Delta^{*1}, \ldots, \Delta^{*B}$ to approximate $\mathrm{Var}_*(\Delta^*)$.
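The Monte Carlo procedure just described can be sketched for a scalar statistic as follows. This is a minimal illustration, assuming the threshold choice $\tau = \max(\rho|\hat\theta|, c_0)$ of Section 2.4 and the divisor $B$ in the variance; the function name and argument names are hypothetical.

```python
import random
import statistics


def modified_bootstrap_variance(data, stat, rho, c0, B=500, rng=None):
    """Monte Carlo approximation of n * Var_*(Delta*) for a scalar statistic."""
    rng = rng or random.Random()
    n = len(data)
    theta_hat = stat(data)
    tau = max(rho * abs(theta_hat), c0)  # threshold, choice (2) of Section 2.4
    deltas = []
    for _ in range(B):
        theta_star = stat(rng.choices(data, k=n))  # one bootstrap replicate
        d = theta_star - theta_hat
        deltas.append(max(-tau, min(tau, d)))      # truncation as in (2.4)
    return n * statistics.pvariance(deltas)


rng = random.Random(0)
sample = [rng.gauss(1.5, 2.0) for _ in range(36)]
print(modified_bootstrap_variance(sample, statistics.median, rho=0.5, c0=0.05, rng=rng))
```

Since the truncated deviations lie in $[-\tau, \tau]$, the result can never exceed $n\tau^2$, which is what prevents the divergence seen in Section 2.1.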


2.5.  Comparison  with  the  estimator  based  on  bootstrap  quantiles 

Consider the situation where $\theta$ is a scalar ($m = 1$). Let $\alpha$ be a constant between
0 and 1/2. Then the following estimator of the asymptotic variance of
$n^{1/2}(\hat\theta - \theta)$ is consistent:

$$\hat\Sigma_q = \left\{ [H^{-1}(1-\alpha) - H^{-1}(\alpha)] / [\Phi^{-1}(1-\alpha) - \Phi^{-1}(\alpha)] \right\}^2,$$

where $\Phi$ is the standard normal distribution,
$H(x) = P_*\{ n^{1/2}(\hat\theta^* - \hat\theta) \le x \}$, and $\Phi^{-1}(\alpha)$ and
$H^{-1}(\alpha)$ are the $\alpha$-quantiles of $\Phi$ and $H$, respectively. An example is
$\alpha = 1/4$, in which case $\hat\Sigma_q$ is the square of a multiple of the interquartile
range of the bootstrap distribution $H$.

Although the quantile-based estimator $\hat\Sigma_q$ is consistent and therefore asymptotically
equivalent to the modified bootstrap estimator $\hat\Sigma_m$, the computation of
$\hat\Sigma_m$ for any fixed sample size is easier and cheaper than that of $\hat\Sigma_q$,
since the former involves the computation of the second-order moment of the bootstrap
distribution $H$ whereas the latter involves the computation of the quantiles of $H$. Usually
$\hat\Sigma_m$ and $\hat\Sigma_q$ have to be approximated by Monte Carlo (see Section 2.4).
Obtaining an accurate Monte Carlo approximation of the second-order moment of the bootstrap
distribution $H$ is much easier than obtaining an accurate Monte Carlo approximation of the
quantiles of $H$. It was shown (Efron, 1987, Section 9) that the Monte Carlo approximation of
the second-order moment of $H$ usually requires 100-200 bootstrap replications. On the other
hand, the Monte Carlo approximation of a quantile of $H$ is more costly, requiring 1000-2000
bootstrap replications. The amount of computation required for $\hat\Sigma_q$ is therefore at
least 10 times as much as that for $\hat\Sigma_m$.

For the same bootstrap replication size $B$, $\hat\Sigma_q$ is much less accurate than
$\hat\Sigma_m$ and is also less accurate than the original bootstrap estimator $\hat\Sigma_b$
when $\hat\Sigma_b$ is consistent. This is shown in the following simulation study.
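The quantile-based estimator of this section can be approximated from the same kind of bootstrap replicates. The sketch below is illustrative only: the empirical quantile rule, the function names and the default $\alpha = 1/4$ are assumptions, and the ratio of quantile spreads is squared so that the result is on the variance scale.

```python
import math
import random
import statistics
from statistics import NormalDist


def empirical_quantile(values, prob):
    """Smallest order statistic whose empirical CDF is at least prob."""
    ordered = sorted(values)
    index = max(0, math.ceil(prob * len(ordered)) - 1)
    return ordered[index]


def quantile_based_variance(data, stat, alpha=0.25, B=1000, rng=None):
    """Variance estimate from the alpha and 1 - alpha quantiles of H."""
    rng = rng or random.Random()
    n = len(data)
    theta_hat = stat(data)
    # Monte Carlo draws from H, the bootstrap law of sqrt(n) * (theta* - theta_hat).
    draws = [math.sqrt(n) * (stat(rng.choices(data, k=n)) - theta_hat)
             for _ in range(B)]
    spread_h = empirical_quantile(draws, 1 - alpha) - empirical_quantile(draws, alpha)
    z = NormalDist()
    spread_phi = z.inv_cdf(1 - alpha) - z.inv_cdf(alpha)
    return (spread_h / spread_phi) ** 2


rng = random.Random(0)
sample = [rng.gauss(1.5, 2.0) for _ in range(36)]
print(quantile_based_variance(sample, statistics.median, B=500, rng=rng))
```

Unlike the moment-based estimators, this one requires sorting the replicates and depends on only two order statistics, which is one way to see why it needs more replicates for comparable accuracy.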


3.  A  SIMULATION  STUDY 

In this section we study by simulation the finite-sample sampling properties of the
modified bootstrap estimator, the original bootstrap estimator and the estimator based on the
bootstrap interquartile range in the case of estimating the asymptotic variances of functions
of the sample median.

Let $\hat Q$ be the sample median based on $n = 36$ i.i.d. samples from a distribution $F$ and
$\hat\theta = g(\hat Q)$. Three functions $g$ are considered: (i) $g(x) = x$; (ii)
$g(x) = x^2/4$; (iii) $g(x) = e^x/4$. The two distributions $F$ under consideration are: (i)
the normal distribution with median (mean) 1.5 and standard deviation 2; (ii) the Cauchy
distribution with median 1.5 and scale parameter 2.

A 

The  function  t(X)  for  the  modified  bootstrap  estimator  is  chosen  to  be  maxCA  161, 0.05). 
For the evaluation of the three bootstrap estimators, Monte Carlo approximation of size
$B = 500$ is used (see Section 2.4).

Table 1 reports the root mean squared errors (rmse) and the biases of the three bootstrap
estimators. The asymptotic variances (denoted by $\sigma^2$) are included. All the results are
based on 2000 simulations on a VAX 11/780 at Purdue University. The IMSL subroutines are used
for generating random numbers.
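A scaled-down version of this design can be sketched in Python for one configuration (normal $F$, $g(x) = x^2/4$). Several details are assumptions rather than the settings of the original study: the value `rho = 0.5`, the number of repetitions (40 instead of 2000), the use of Python's `random` module in place of the IMSL generators, and the reporting of estimates on the $\sigma^2/n$ scale, which is the scale on which the computed target matches the value 0.0982 listed in Table 1. The numbers it prints are therefore not expected to reproduce Table 1.

```python
import math
import random
import statistics
from statistics import NormalDist


def g(x):
    return x * x / 4.0  # function (ii) of Section 3


def three_estimates(data, B, rho, c0, rng):
    """Return (modified, original, interquartile) estimates of Var(theta_hat)."""
    n = len(data)
    theta_hat = g(statistics.median(data))
    tau = max(rho * abs(theta_hat), c0)
    diffs = sorted(g(statistics.median(rng.choices(data, k=n))) - theta_hat
                   for _ in range(B))
    clipped = [max(-tau, min(tau, d)) for d in diffs]  # truncation (2.4)
    original = statistics.pvariance(diffs)
    modified = statistics.pvariance(clipped)
    q1 = diffs[math.ceil(0.25 * B) - 1]
    q3 = diffs[math.ceil(0.75 * B) - 1]
    z = NormalDist()
    iqr = ((q3 - q1) / (z.inv_cdf(0.75) - z.inv_cdf(0.25))) ** 2
    return modified, original, iqr


def simulate(reps=40, n=36, B=500, rho=0.5, c0=0.05, seed=0):
    rng = random.Random(seed)
    F = NormalDist(1.5, 2.0)
    # sigma^2 / n from (2.6): g'(1.5)^2 * 0.25 / f(1.5)^2 / n, with g'(x) = x / 2.
    target = (1.5 / 2.0) ** 2 * 0.25 / F.pdf(1.5) ** 2 / n
    results = [three_estimates([rng.gauss(1.5, 2.0) for _ in range(n)],
                               B, rho, c0, rng) for _ in range(reps)]
    summary = {}
    names = ("modified", "original", "interquartile")
    for name, column in zip(names, zip(*results)):
        bias = statistics.mean(column) - target
        rmse = math.sqrt(statistics.mean([(v - target) ** 2 for v in column]))
        summary[name] = (bias, rmse)
    return target, summary


target, summary = simulate()
print(f"target sigma^2/n = {target:.4f}")
for name, (bias, rmse) in summary.items():
    print(f"{name:>13}: bias = {bias:+.4f}, rmse = {rmse:.4f}")
```

All three estimators are computed from the same set of bootstrap replicates, so differences between them within a repetition are not due to resampling noise.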

We  summarize  the  simulation  results  as  follows. 

(1) Overall. All three bootstrap variance estimators are upward biased. The modified
bootstrap estimator reduces the bias considerably. In terms of the rmse, the modified bootstrap
significantly outperforms the original bootstrap and the bootstrap interquartile range. The
ratio of the rmse of the modified bootstrap estimator to the rmse of the original bootstrap
estimator (or the bootstrap interquartile range), denoted by $R$, is shown in Table 1.

(2) The modified bootstrap and the original bootstrap. The improvement of the modified
bootstrap over the original bootstrap is larger if the distribution $F$ has heavier tails
and/or the function $g(x)$ has a faster rate of divergence (as $|x| \to \infty$). This
indicates that even if the original bootstrap estimator is consistent, the modified bootstrap
estimator may have a faster convergence rate.

(3) The modified bootstrap and the interquartile range. With the same bootstrap replication
number $B = 500$, the modified bootstrap is much more efficient than the bootstrap
interquartile range: the ratio $R$ is usually about 0.5-0.6. In fact, the bootstrap
interquartile range is also not as good as the original bootstrap estimator in the case where
the original bootstrap estimator is consistent.

(4) The effects of distribution tails and function $g$. The case of $F$ = Cauchy distribution
and $g(x) = e^x/4$ is an exceptional case: the original bootstrap estimator is inconsistent
(diverges to infinity) and the biases and rmse of the other two estimators are also very large.
This indicates that although the modified bootstrap and bootstrap interquartile range
estimators are consistent, the sample size $n = 36$ is not large enough when the distribution
$F$ has heavy tails and $g(x)$ diverges to infinity at a very fast rate. However, the result in
Table 1 still clearly shows that the modified bootstrap estimator is much better.

APPENDIX


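Proof of Theorem 1. From Bickel and Freedman (1981), for almost all $X_1, X_2, \ldots$, the
conditional distribution of $n^{1/2}(\hat\theta^* - \hat\theta)$ converges to the distribution
of $Z$ (given in (2.5)). Let $X_1, X_2, \ldots$ be a fixed sequence such that (2.2) holds and
the conditional distribution of $n^{1/2}(\hat\theta^* - \hat\theta)$ converges to the
distribution of $Z$. Let $P_*$ be the bootstrap conditional probability and $\lambda$ be an
arbitrary nonzero $m$-vector. For any fixed $t > 0$,

$$\left| P_*\{ n\lambda'(\hat\theta^* - \hat\theta)(\hat\theta^* - \hat\theta)'\lambda \le t \} - P_*\{ n\lambda'(\Delta^*\Delta^{*\prime})\lambda \le t \} \right| \le 1 - P_*\{ |\hat\theta_j^* - \hat\theta_j| \le \tau_j,\ j = 1, \ldots, m \} \to 0$$

as $n \to \infty$. Therefore the conditional distribution of $n(\Delta^*\Delta^{*\prime})$
converges to the distribution of $ZZ'$. It remains to show that there is a constant
$\delta > 0$ such that

$$E_*(n^{1/2}\|\Delta^*\|)^{2+\delta} = O(1) \quad \text{a.s.} \tag{A1}$$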

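We now show that (A1) holds with $\delta = 2$. Since $\nabla g$ is continuous in a neighborhood
of $\mu$, there are positive constants $\eta$ and $M$ such that

$$\mathrm{trace}\{[\nabla g(x)]'[\nabla g(x)]\} \le M \quad \text{if } \|x - \mu\| \le 2\eta.$$

By the strong law of large numbers, almost surely,

$$\bar X \to \mu \quad \text{and} \quad n^{-1}\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)' \to \mathrm{Var}(X_1). \tag{A2}$$

Let $X_{ij}$ and $\bar X_j$ be the $j$th components of $X_i$ and $\bar X$, respectively. By
Marcinkiewicz's strong law of large numbers, almost surely,

$$n^{-2}\sum_{i=1}^n (X_{ij} - \bar X_j)^4 \le 16 n^{-2}\sum_{i=1}^n (X_{ij} - EX_{ij})^4 \to 0 \quad \text{for all } j = 1, \ldots, k. \tag{A3}$$

Let $X_1, X_2, \ldots$ be a sequence such that (2.2), (A2) and (A3) hold. Then
$\|\bar X - \mu\| \le \eta$ for large $n$. Let $I(A)$ be the indicator function of the set $A$.
Then

$$
\begin{aligned}
n^2 E_*\|\Delta^*\|^4 &= n^2 E_*\|\Delta^*\|^4 I(\|\bar X^* - \bar X\| \le \eta) + n^2 E_*\|\Delta^*\|^4 I(\|\bar X^* - \bar X\| > \eta) \\
&\le n^2 E_*\|\hat\theta^* - \hat\theta\|^4 I(\|\bar X^* - \bar X\| \le \eta) + \|\tau(X)\|^4 n^2 E_* I(\|\bar X^* - \bar X\| > \eta) \\
&= n^2 E_*\|\nabla g(\xi^*)(\bar X^* - \bar X)\|^4 I(\|\bar X^* - \bar X\| \le \eta) + \|\tau(X)\|^4 n^2 E_* I(\|\bar X^* - \bar X\| > \eta) \qquad \text{(A4)} \\
&\le M^2 n^2 E_*\|\bar X^* - \bar X\|^4 I(\|\bar X^* - \bar X\| \le \eta) + \eta^{-4}\|\tau(X)\|^4 n^2 E_*\|\bar X^* - \bar X\|^4 \qquad \text{(A5)} \\
&\le (M^2 + \eta^{-4}\|\tau(X)\|^4)\, n^2 E_*\|\bar X^* - \bar X\|^4,
\end{aligned}
$$

where (A4) follows from the mean-value theorem and $\xi^*$ is a point on the line segment
between $\bar X^*$ and $\bar X$, and (A5) follows from

$$\|\xi^* - \mu\| \le \|\bar X - \mu\| + \|\xi^* - \bar X\| \le \eta + \|\bar X^* - \bar X\|.$$

Under (2.2), $\|\tau(X)\| = O(1)$. Hence the result follows from

$$n^2 E_*(\bar X_j^* - \bar X_j)^4 = O(1), \tag{A6}$$

where $\bar X_j^*$ is the $j$th component of $\bar X^*$. A straightforward calculation shows
that

$$n^2 E_*(\bar X_j^* - \bar X_j)^4 = n^{-2}\sum_{i=1}^n (X_{ij} - \bar X_j)^4 + 3(n^{-2} - n^{-3})\Big[\sum_{i=1}^n (X_{ij} - \bar X_j)^2\Big]^2.$$

Hence (A6) follows from (A2)-(A3), and thus the result.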
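The closed form for $n^2 E_*(\bar X_j^* - \bar X_j)^4$ used above can be checked numerically for a small sample by enumerating all $n^n$ equally likely bootstrap resamples. The following sketch does this in Python; the function names and the five data values are arbitrary illustrations.

```python
from itertools import product


def fourth_moment_exact(x):
    """n^2 * E_*(mean(X*) - mean(x))^4 by enumerating all n^n resamples."""
    n = len(x)
    xbar = sum(x) / n
    total = sum((sum(r) / n - xbar) ** 4 for r in product(x, repeat=n))
    return n ** 2 * total / n ** n


def fourth_moment_formula(x):
    """Right-hand side of the closed-form expression in the proof."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x)
    s4 = sum((v - xbar) ** 4 for v in x)
    return s4 / n ** 2 + 3 * (n ** -2 - n ** -3) * s2 ** 2


x = [0.3, -1.2, 2.5, 4.0, 0.7]
print(fourth_moment_exact(x), fourth_moment_formula(x))
```

The two printed values agree up to floating-point rounding.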

Proof of Theorem 2. From Bickel and Freedman (1981), for almost all $X_1, X_2, \ldots$, the
conditional distribution of $n^{1/2}(\hat\theta^* - \hat\theta)$ converges to the normal
distribution with mean zero and covariance matrix given by (2.6). Following the same argument
as in the proof of Theorem 1, we only need to show (A1).

Replacing $\bar X^*$ and $\bar X$ by $\hat Q^*$ and $\hat Q$ in the proof of Theorem 1, we have

$$n^2 E_*\|\Delta^*\|^4 \le c\, n^2 E_*\|\hat Q^* - \hat Q\|^4, \tag{A7}$$

where $c$ is a positive constant. Then (A1) follows from (A7) and

$$n^2 E_*\|\hat Q^* - \hat Q\|^4 = O(1) \quad \text{a.s.}$$

under $E[\log(1 + |X_1|)] < \infty$ (see Babu, 1986).

Table 1. Results of the simulation comparison of the modified bootstrap, the original
bootstrap and the interquartile range estimators.

Normal distribution

                     Modified bootstrap   Original bootstrap                Interquartile range
g(x)      σ²/n       bias      rmse       bias        rmse        R†        bias      rmse      R†
x         0.1745     0.0108    0.0985     0.0150      0.1049      0.9390    --        0.1747    0.5638
x²/4      0.0982     0.0070    0.0722     0.0193      0.0921      0.7839    0.0194    0.1361    0.5305
e^x/4     0.2191     0.1252    0.3330     0.2211      0.5908      0.5636    --        0.5724    0.5818

Cauchy distribution

                     Modified bootstrap   Original bootstrap                Interquartile range
g(x)      σ²/n       bias      rmse       bias        rmse        R†        bias      rmse      R†
x         0.2742     0.0617    0.1928     0.1037      0.2605      --        --        --        0.5516
x²/4      0.1542     --        0.1747     0.1302      0.4320      --        --        --        0.4843
e^x/4     0.3442     0.5556    1.6558     1.03 × 10*  4.56 × 10*  --        0.8112    8.3566    0.1981

† R = (rmse of the modified bootstrap estimator) / (rmse of the original bootstrap estimator or
of the interquartile range estimator).
-- : entry not recoverable from the source; 10* : exponent not legible in the source.